UNCLASSIFIED


1.  Introduction 


The term "exponential smoothing" seems to have been coined for
the first time by R. G. Brown [1] in 1959 for a particular time series
forecasting technique (or a statistical estimation technique, depending
on one's point of view). Basically, the technique involves weighting
each bit of past history with geometrically decreasing weights, less
and less weight being given to the older part of the history. Certainly
such a procedure has a great deal of intuitive appeal and, moreover, it
has been shown that exponential smoothing entails less computer storage
than some of the classical techniques such as forecasting by a moving
average. These and other advantages are well documented in the book
[4] on smoothing by Brown, a book almost entirely devoted to the
exponential smoothing technique. Since an inventory system, particularly
under a periodic review model, so often entails basing decisions for the
future on past demand history, forecasting techniques are of considerable
interest to the inventory manager.

It is quite evident that exponential smoothing has been widely
adopted by Naval Supply Systems Command as a basic forecasting technique.
A review of almost any document which involves forecasting or
estimation, such as the various ALRAND reports and PAR documents, makes
it quite clear that this is the case. And, since the book [4] by Brown is
practically a sole source of information on the subject, it is not
surprising to find said book extensively referenced throughout such
documents. The writer has not been able to find any other text materials
in which anything beyond a cursory treatment of exponential smoothing is
given. And yet this textbook by Brown, Chapter 9 in particular, is
replete with errors of both a typographical and a conceptual nature.

Some added difficulty is created by the use of notation which is not
consistent with the meaning usually given such symbols in related
scientific literature. For example, the notation $\hat{a}$, $\hat{b}$, $\hat{c}$ does not
always denote estimates of the corresponding parameters a, b, c as it
normally is used. In other cases, the same symbol has been used
ambiguously for two different quantities, which certainly leads to
confusion.

One of the biggest indictments of the material presented in
Chapter 9 of Brown's book is the fact that his so-called Fundamental
Theorem, which hardly qualifies as a theorem to begin with, is only an
asymptotic (with time) result but is presented, used and discussed in
such a way as to lead the reader to believe otherwise. Indeed, since
the entire book rests basically on this Fundamental Theorem, it is
not surprising that nearly every result in the book is an asymptotic
result. This includes claims for statistical unbiasedness, a property
which is weak enough in itself without holding only asymptotically. Yet, except
for an occasional and casual use of the phrase, "after the initial
transient becomes negligible," the reader is never made aware of this
fact.

Another fundamental criticism from a statistical point of view is
Brown's constant use of mean absolute deviation (MAD) to estimate
statistical variation. The futility of using MAD to account for
variability has been well documented in the statistical literature for
years. Its use by Brown seems to be justified mainly, and not
surprisingly, because of its amenability to the exponential smoothing
technique. Out of curiosity, the writer did a quick survey of the
recent literature on the subject of variability and has been unable
to find any significant result that would change one's attitude
toward MAD. And yet, the disadvantages associated with this measure
of variability are not mentioned once in Brown's book. But there is no
hesitation in mentioning (p. 282) the computational disadvantage in
using the standard deviation as a measure of variability. And of
course computational convenience is but one of a list of criteria to
be considered in selecting a model, and it is a real disservice to ignore
other, perhaps even more important, criteria.

The  purpose  of  this  report,  then,  is  to  clarify  some  of  the 
results  given  in  Brown's  book  and  to  emphasize,  much  more  strongly 
than  does  the  author  himself,  the  assumptions,  tacit  and  otherwise, 
that  yield  these  results.  In  this  way,  it  is  hoped  that  the  reader 
will  be  more  aware  of  the  restrictive  nature  of  some  of  the  formulas 
derived  in  Brown's  book  and  will  thereby  exercise  some  caution  in  their 
application.  For  a  special  case  where  Brown's  formulas  are  only 
asymptotically  (in  time)  valid,  alternative  forms  are  presented  which 
are  valid  for  finite  values  of  time  parameters. 

2.  Initial  Conditions 

The  first  matter  to  be  discussed  in  this  report  concerns  the  very 
definition  of  exponential  smoothing.  In  the  first  place,  Brown  seems 
to  be  inconsistent  in  the  definition  employed  in  his  early  papers 
[1] and [2], and the one adopted later in his textbook [4]. In the
former, single exponential smoothing of the sequence $x_0, x_1, x_2, \ldots$
is defined by

$$\bar{x}_t = \alpha \sum_{j=0}^{t-1} (1-\alpha)^j x_{t-j} + \alpha (1-\alpha)^t x_0,$$

which may as well be written

$$\bar{x}_t = \alpha \sum_{j=0}^{t} (1-\alpha)^j x_{t-j},$$

since it is identically the same. (The parameter $\alpha$ is a number in the
interval [0, 1], called the smoothing constant.) This is equation (3),
page 675 of [2]. Yet, on page 101 of [4] we find the symbol $S_t(x)$ used
to denote the same quantity, and this time it is defined to be


$$S_t(x) = \alpha \sum_{j=0}^{t-1} (1-\alpha)^j x_{t-j} + (1-\alpha)^t x_0.$$

The difference, of course, is in the coefficient of $(1-\alpha)^t$ in
both expressions or, viewed another way, the difference lies in the
weight to be given the observation $x_0$. In any case, both formulas are
claimed to be derived from the basic recursion relation,

$$S_t(x) = \alpha x_t + (1-\alpha) S_{t-1}(x),$$

presumably valid for $t = 1, 2, 3, \ldots$. But successive substitution in
this recursion relation only yields

$$S_t(x) = \alpha \sum_{j=0}^{t-1} (1-\alpha)^j x_{t-j} + (1-\alpha)^t S_0(x).$$

Clearly, then, the question of compatibility of these two forms of the
definition of the exponential smoothing operator depends upon how one
defines the initial condition $S_0(x)$. If the first formula is to be valid
then we must have $S_0(x) = \alpha x_0$, while if the textbook form is used then
it must be the case that $S_0(x) = x_0$. Since Brown is not explicit on
this point we can only postulate what was intended. In either case,
the resulting definition depends somewhat on how $x_0$ is treated, since in
one case $x_0$ is given weight $\alpha$ initially and unit weight in the other
case. In the first case, given in Brown's paper, if exponential
smoothing is viewed as a variation of averaging so that the result is a weighted
sum of the observations, then the sum of the weights is not unity, which
is awkward statistically speaking.
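The dependence on the initial condition can be checked numerically. The following is a minimal sketch and is not taken from Brown or from this report: the data values, the smoothing constant, and the function names `smooth_recursive` and `weights` are illustrative assumptions. It runs the recursion from each of the two candidate values of $S_0(x)$, compares the result with the corresponding closed form, and sums the weights.

```python
def smooth_recursive(xs, alpha, s0):
    """Run S_t = alpha*x_t + (1 - alpha)*S_{t-1} for t = 1..len(xs)-1 from S_0 = s0."""
    s = s0
    for x in xs[1:]:
        s = alpha * x + (1.0 - alpha) * s
    return s


def weights(t, alpha, paper_form):
    """Weights placed on x_t, x_{t-1}, ..., x_1, x_0 (in that order)."""
    beta = 1.0 - alpha
    w = [alpha * beta**j for j in range(t)]             # x_t down to x_1
    w.append((alpha if paper_form else 1.0) * beta**t)  # weight on x_0
    return w


xs = [10.0, 12.0, 9.0, 11.0, 13.0]  # hypothetical observations x_0..x_4
alpha = 0.2                          # hypothetical smoothing constant
t = len(xs) - 1

paper_value = smooth_recursive(xs, alpha, s0=alpha * xs[0])  # S_0(x) = alpha*x_0
text_value = smooth_recursive(xs, alpha, s0=xs[0])           # S_0(x) = x_0

# Closed forms: weighted sums of the observations, newest first.
paper_closed = sum(w * x for w, x in zip(weights(t, alpha, True), reversed(xs)))
text_closed = sum(w * x for w, x in zip(weights(t, alpha, False), reversed(xs)))

print("paper form:   ", paper_value, paper_closed, sum(weights(t, alpha, True)))
print("textbook form:", text_value, text_closed, sum(weights(t, alpha, False)))
```

Under these assumptions the paper form yields weights summing to $1 - (1-\alpha)^{t+1}$, while the textbook form yields weights summing to one.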

Of course, how one defines the initial condition is of little
consequence when only asymptotic results are considered, since the effect
of the initial condition eventually becomes negligible in either of the
above cases. And, for this reason, the inconsistency in defining $S_0(x)$
(actually the utter lack of any explicit mention of same) never appears
to be a problem because, as we have said, Brown's results are, by and
large, only asymptotically valid, hence applicable only to a steady state
condition. Yet, the point is more than merely academic. The formula
is a result of a recursion relation and, to apply such a relation in a
model requires an initial condition, as does any application of a
mathematical recursion. Moreover, statistical properties, notably
unbiasedness, definitely depend upon how one treats the initial
condition. Finally, there are many realistic situations in which there
is simply not enough past history to justify the application of an
asymptotic result, in which case the initial condition becomes a very
important factor and can considerably influence the consequences.

Several points of view regarding the meaning to be attached to
$x_0$ in the sequence $x_t$, $t = 0, 1, 2, \ldots$, can be justified. If $x_t$
represents the demand occurring in the $t$th time period of an inventory
model, then it is quite natural to define $x_0 = 0$ since initially,
that is before we begin operating the system, there is no demand. In
that case, it does not matter which of the above forms we use for $S_0(x)$
since, in either case, we obtain $S_0(x) = 0$ also. But then we may as
well write

$$S_t(x) = \alpha \sum_{j=0}^{t-1} (1-\alpha)^j x_{t-j},$$

in which case, writing $\beta$ for $1 - \alpha$, the sum of the weights is

$$\alpha \sum_{j=0}^{t-1} \beta^j = 1 - \beta^t,$$

which is not unity. One of the consequences of this result is that if
we are observing a process with constant mean then the smoothing operator
$S_t(x)$ is not unbiased as is often claimed in such circumstances. This
is precisely one of the problems encountered by Bessler and the writer
[8] in attempting to apply exponential smoothing to a dynamic inventory
model originally developed by Vassian in 1955. This led them to define
a modified version of smoothing which they call finite exponential
smoothing. Denoting this modification by $\tilde{S}_t(x)$, it is defined in [8]
by

$$\tilde{S}_t(x) = \alpha_t \sum_{j=0}^{t} \beta^j x_{t-j},$$

where

$$\alpha_t = \frac{1 - \beta}{1 - \beta^{t+1}}.$$

With the coefficients thus normalized, the sum of the corresponding
weights is unity as desired. Further properties of this modified version
of smoothing and some of its applications may be found in [8].
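A short sketch of the normalization idea follows. It is an illustration only and is not the implementation of [8]: the function names are hypothetical, and the normalizing coefficient `alpha_t = alpha / (1 - beta**(t + 1))` is an assumption, chosen because it is the value that makes the weights $\alpha_t \beta^j$, $j = 0, \ldots, t$, sum to one.

```python
def finite_smoothing_weights(t, alpha):
    """Normalized geometric weights on x_t, x_{t-1}, ..., x_0 (assumed form)."""
    beta = 1.0 - alpha
    alpha_t = alpha / (1.0 - beta ** (t + 1))  # assumed normalizing coefficient
    return [alpha_t * beta**j for j in range(t + 1)]


def finite_smooth(xs, alpha):
    """Weighted average of xs = [x_0, ..., x_t] with the normalized weights."""
    t = len(xs) - 1
    w = finite_smoothing_weights(t, alpha)
    return sum(wj * x for wj, x in zip(w, reversed(xs)))


# Hypothetical check: a constant series is reproduced exactly at every t,
# which is the finite-sample unbiasedness property for a constant mean.
for t in range(6):
    print(t, sum(finite_smoothing_weights(t, 0.3)), finite_smooth([7.0] * (t + 1), 0.3))
```

Because the weights sum to one for every finite t, a constant-mean process is estimated without bias from the first observation onward, not only in the limit.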

Another point of view that might be taken regarding the initial
condition applies when the assumption in the model is that
$x_t = \xi_t + \epsilon_t$, where $\xi_t$ is a deterministic function of $t$ and $\epsilon_t$ is a
random variable with mean zero and constant variance $\sigma^2$. In that case,
it is natural to suppose that $x_0 = \xi_0 + \epsilon_0$ to be consistent with the
rest of the model. Whether or not such an assumption is suitable
depends upon further considerations in the model. For example, suppose it
is assumed that $\xi_t = a$, where $a \neq 0$. In that case, $S_t(x)$ is unbiased
if we use the version $S_0(x) = x_0$ but is not unbiased if we use
$S_0(x) = \alpha x_0$ instead.
In many of the applications which Brown discusses in his book
[4], he speaks of $x_0$ as representing some initial (any initial)
estimate of, say, demand, up to the time the process is to be observed.
In some cases, such an estimate may be sheer judgment, or rather
guess, as to what the, say, constant mean demand will be. In other
cases, it may be obtained from the manner in which it is hoped that
the process will behave. In still other cases, $x_0$ may be a number
which depends upon some related process whose behavior has been
previously observed. In any case we are then considering $x_0$ as being
an estimate from a separate distribution, one not necessarily related
to the assumption $x_t = \xi_t + \epsilon_t$. Then $S_t(x)$ is or is not unbiased
depending upon both the distribution that does represent $x_0$ as well
as which form of $S_0(x)$ we use. For example, if $\xi_t = a$ for $t = 1, 2, \ldots$
then

$$E[S_t(x)] = a - a\beta^t + \beta^t E[x_0]$$

if we take $S_0(x) = x_0$, while

$$E[S_t(x)] = a - a\beta^t + \alpha\beta^t E[x_0]$$

if we take $S_0(x) = \alpha x_0$. In either case, whether or not $E[S_t(x)] = a$
depends upon $E[x_0]$, and certainly in general it will be the case that
$E[S_t(x)] \neq a$.
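The two expectations can be checked by pushing the expectation through the recursion, since expectation is linear. The sketch below is illustrative only; the function names and the numerical values of a, the smoothing constant, t and E[x_0] are assumptions made for the example.

```python
def expected_smooth(t, alpha, a, mean_x0, paper_form):
    """E[S_t(x)] when E[x_s] = a for s >= 1 and E[x_0] = mean_x0."""
    beta = 1.0 - alpha
    e = (alpha if paper_form else 1.0) * mean_x0  # E[S_0(x)] under each convention
    for _ in range(t):
        e = alpha * a + beta * e                  # expectation of the recursion
    return e


def closed_form(t, alpha, a, mean_x0, paper_form):
    """a - a*beta^t + c*beta^t*E[x_0], with c = 1 (textbook) or alpha (paper)."""
    beta = 1.0 - alpha
    c = alpha if paper_form else 1.0
    return a - a * beta**t + c * beta**t * mean_x0


a, alpha, t = 50.0, 0.1, 12  # hypothetical mean, smoothing constant, horizon
for mean_x0 in (50.0, 0.0, 80.0):
    for paper_form in (False, True):
        bias = expected_smooth(t, alpha, a, mean_x0, paper_form) - a
        print(f"E[x_0]={mean_x0:5.1f}  paper_form={paper_form!s:5}  bias={bias:+.4f}")
```

In this sketch the bias vanishes only for the textbook convention combined with E[x_0] = a; every other combination leaves a bias that decays like $\beta^t$.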
3.  Fundamental  Theorem


As indicated earlier, most of the mathematics of exponential
smoothing is summarized in what Brown calls his Fundamental Theorem of
Exponential Smoothing, the statement and "proof" of which are given on
page 133 of [4]. Using the model $x_t = \xi_t + \epsilon_t$ where, in general,

$$\xi_t = a_0 + a_1 t + \frac{a_2}{2} t^2 + \cdots + \frac{a_n}{n!} t^n$$

and $\{\epsilon_t\}_{t=0}^{\infty}$
represent independent random variables, identically distributed with
zero means and constant variance $\sigma^2$, Brown asserts that his fundamental
theorem "proves that it is possible to estimate the $n + 1$ coefficients
in an $n$th order polynomial model by linear combinations of the first
$(n + 1)$ orders of exponential smoothing." The general $k$th-order
smoothing operator is defined inductively by

$$S_t^{[k]}(x) = \alpha S_t^{[k-1]}(x) + (1 - \alpha) S_{t-1}^{[k]}(x) \quad \text{for } t = 1, 2, 3, \ldots$$


In the first place, the fundamental theorem is not really a theorem
at all but simply an observation that the $p$th-order smoothing operator
can be written explicitly in terms of the coefficients of the model.
But worse, what is stated as the fundamental theorem is simply not
true. Thus, even for $p = 1$ it is just not true that

$$S_t(x) = \sum_{k=0}^{n} (-1)^k \frac{x_t^{(k)}}{k!} \, \alpha \sum_{j=0}^{\infty} j^k \beta^j$$

as asserted by the theorem. Later in this section, we will derive
the correct expression for $S_t(x)$ and show that what is given here is
an approximation.

Secondly, even if one were to call the result a theorem in a
broad sense, the proof that is given is not a proof of the statement
of the theorem at all. Indeed, the opening line of the proof on
page 133 asks the reader to "Think of the infinite sequence of
observations, $\ldots, x_t$ for $t = \ldots, -1, 0, 1, \ldots, \infty$." But one is not
given an infinite sequence of observations. In fact, all that is given
for any application are the observations $x_0, x_1, x_2, \ldots, x_t$. Giving the
author the benefit of the doubt, however, let us suppose that the
"extra variables" are simply being used as surplus variables to generate
a proof. Certainly the observations $x_{t+1}, x_{t+2}, \ldots$ turn out to be
redundant, for we find, reading further, that a new sequence is introduced
by the definition

$$S_t = \begin{cases} 0 & \text{if } t < 0 \\ \alpha\beta^t & \text{if } t \ge 0 \end{cases}$$

whereupon it is asserted that

$$S_t(x) = \sum_{j=0}^{\infty} x_{t-j} S_j$$

found by the convolution of $\{x_j\}_{j=-\infty}^{\infty}$ and $\{S_k\}_{k=-\infty}^{\infty}$.
Thus, the effect of defining $S_{-1}, S_{-2}, \ldots$ to be zero is to cancel out the
observations $x_{t+1}, x_{t+2}, \ldots$ in writing the convolution product given
in the text. But what remains is, after correcting a misprint on
page 133, given by

$$S_t(x) = \alpha \sum_{j=0}^{\infty} \beta^j x_{t-j}$$

and this is not the definition of $S_t(x)$, although the author certainly
uses the same symbol and refers to this as the single exponential
smoothing operator.

What possible points of view can be taken to resolve this
apparent inconsistency? One approach would be to assume the author
intended to define $S_j$ by means of

$$S_j = \begin{cases} \alpha\beta^j & \text{if } 0 \le j \le t \\ 0 & \text{otherwise.} \end{cases}$$

Or, we might assume that the extra variables are all zero, that is,
$x_n = 0$ if $n < 0$. In either case, convolution would then yield the
formula

$$S_t(x) = \alpha \sum_{j=0}^{t} \beta^j x_{t-j}$$

which is consistent with the fact that we will be estimating with
observations $x_0, x_1, \ldots, x_t$. Unfortunately, this formula is still not
quite the same as that given previously in the text on page 101 where
$S_t(x)$ is defined. There, the coefficient of $x_0$ is given as $\beta^t$ whereas
here in the fundamental theorem, the coefficient of $x_0$ is $\alpha\beta^t$ under
any of the above versions.

A third criticism is that the theorem does not prove (even if it
were valid) that the coefficients in the model can be estimated by
linear combinations of $S^{[1]}(x), S^{[2]}(x), \ldots, S^{[n+1]}(x)$ as quoted
above. There is still the question of solving the system of equations
given by the theorem for the coefficients. The author proceeds to do
this for two special cases in the remainder of the chapter. But even
so, we are compelled to remark that, of course, it is possible to
estimate the coefficients this way. Indeed one can use any function
of the observations to estimate them. But for any estimates to be
meaningful they should satisfy some criteria, at least from a statistical
point of view. Are the estimates presented by the author unbiased? We
have seen that in general they are not. For the special case
$\xi_t = a_0 + a_1 t$, the estimates given are certainly not least squares
nor, if normality is assumed, maximum likelihood, since these estimates
are well known and are not the same. One of the few criteria claimed to be
satisfied, and shown by D'Esopo [3], is that the estimates, not surprisingly,
minimize "exponentially discounted least squares," i.e., minimize the
quantity

$$\sum_{j=0}^{\infty} \beta^j (x_{t-j} - p_{t-j})^2$$

at least among polynomial fits. Such a ground rule for deriving estimates
is not conventional, however, and is tantamount to selecting an estimate
by fiat.
It might be instructive to see, in contrast to what appears
in Brown's fundamental theorem, what the precise results are, at least
for the special case of a linear model. In order to maintain the
same notation as Brown we will assume a deterministic model at first,
so that we suppose $x_t = a + bt$, $t = 0, 1, 2, \ldots$. Brown is not explicit
on this point, continually confounding the original random model with
the deterministic version whenever it suits his purpose. We will be
careful to always make this distinction, however, so that estimation
can be discussed in its proper context while analytic operations are
only performed on deterministic quantities, to which they should be
restricted. We then have, in Brown's notation, $x_t^{(0)} = a + bt$ and
$x_t^{(1)} = b$. Since two versions of $S_t(x)$ exist even in the same context
for finite $t$, we will have to make a choice of definitions. Here we
will assume that the definition $S_0(x) = x_0$ is to be preferred since,
then, the sum of the weights will be unity in the version

$$S_t(x) = \alpha \sum_{k=0}^{t-1} \beta^k x_{t-k} + \beta^t x_0.$$

Also, double smoothing can then be written

$$S_t^{[2]}(x) = \alpha \sum_{k=0}^{t-1} \beta^k S_{t-k}(x) + \beta^t S_0(x).$$

Here we have made the natural assumption that

$$S_0^{[2]}(x) = S_0(x).$$




In order to derive the finite analogues of Brown's fundamental
theorem, it is only necessary to substitute in these formulas and
simplify the resulting algebra. The simplification is assisted by
a knowledge of finite expansions of functions of the basic geometric
progression $\sum_{k=0}^{t} \beta^k$. For the record, the first three of these
expansions are given below. They, and others, can easily be derived
by successively differentiating with respect to the continuous
variable $\beta$ $(0 < \beta < 1)$ and simplifying the resulting algebra.

$$\sum_{k=0}^{t} \beta^k = \frac{1 - \beta^{t+1}}{\alpha}$$

$$\sum_{k=0}^{t} k \beta^k = \frac{\beta - (t+1)\beta^{t+1} + t\beta^{t+2}}{\alpha^2} \tag{3-1}$$

$$\sum_{k=0}^{t} k^2 \beta^k = \frac{\beta + \beta^2 - (t+1)^2 \beta^{t+1} + (2t^2 + 2t - 1)\beta^{t+2} - t^2 \beta^{t+3}}{\alpha^3}$$
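The three finite expansions in (3-1) can be checked against brute-force summation. The sketch below is an illustration, not part of the original report; it uses exact rational arithmetic so that the comparison is an identity rather than a floating-point approximation, and the function names and the particular value of the smoothing parameter are assumptions.

```python
from fractions import Fraction


def closed_forms(beta, t):
    """Closed forms for sum(beta^k), sum(k*beta^k), sum(k^2*beta^k), k = 0..t."""
    alpha = 1 - beta
    s0 = (1 - beta ** (t + 1)) / alpha
    s1 = (beta - (t + 1) * beta ** (t + 1) + t * beta ** (t + 2)) / alpha**2
    s2 = (beta + beta**2 - (t + 1) ** 2 * beta ** (t + 1)
          + (2 * t * t + 2 * t - 1) * beta ** (t + 2)
          - t * t * beta ** (t + 3)) / alpha**3
    return s0, s1, s2


def direct_sums(beta, t):
    """The same three sums computed term by term."""
    return tuple(sum(k**p * beta**k for k in range(t + 1)) for p in (0, 1, 2))


beta = Fraction(7, 10)  # hypothetical value of beta = 1 - alpha
for t in range(8):
    print(t, closed_forms(beta, t) == direct_sums(beta, t))
```

Exact fractions are used here deliberately: with floating-point numbers the division by the cube of a small smoothing constant would blur the comparison.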


From the above definition and assumptions we then have

$$S_t(x) = \alpha \sum_{k=0}^{t-1} \beta^k (a + b(t-k)) + a\beta^t = \alpha (a + bt) \sum_{k=0}^{t-1} \beta^k - \alpha b \sum_{k=0}^{t-1} k\beta^k + a\beta^t.$$
After some simplification, we obtain,

$$S_t(x) = x_t^{(0)} - b\frac{\beta}{\alpha} + \frac{b}{\alpha}\beta^{t+1}. \tag{3-2}$$

Likewise, substituting in the formula for double smoothing yields,

$$S_t^{[2]}(x) = x_t^{(0)} - 2b\frac{\beta}{\alpha} + 2\frac{b}{\alpha}\beta^{t+1} + bt\beta^{t+1}. \tag{3-3}$$

These are the exact formulas for $S_t(x)$ and $S_t^{[2]}(x)$, valid for all
finite $t$, and of course they differ from those given by Brown.
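The exact formulas (3-2) and (3-3) can be confirmed by running the smoothing recursions directly on a deterministic linear sequence. The following sketch is illustrative only: the function names and the particular values of a, b and the smoothing constant are assumptions, and both initial conditions are taken equal to $x_0$ as in the text. Exact rational arithmetic is used so that agreement is exact.

```python
from fractions import Fraction


def smooth_sequence(xs, alpha):
    """Return [S_0, S_1, ..., S_t] with S_0 = xs[0] and S_t = alpha*x_t + (1-alpha)*S_{t-1}."""
    out = [xs[0]]
    for x in xs[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out


def exact_single(a, b, alpha, t):
    """Formula (3-2) for single smoothing of x_t = a + b*t."""
    beta = 1 - alpha
    return a + b * t - b * beta / alpha + (b / alpha) * beta ** (t + 1)


def exact_double(a, b, alpha, t):
    """Formula (3-3) for double smoothing of x_t = a + b*t."""
    beta = 1 - alpha
    return (a + b * t - 2 * b * beta / alpha
            + 2 * (b / alpha) * beta ** (t + 1) + b * t * beta ** (t + 1))


a, b, alpha = Fraction(5), Fraction(2), Fraction(1, 5)  # hypothetical values
xs = [a + b * t for t in range(21)]
single = smooth_sequence(xs, alpha)      # S_t(x)
double = smooth_sequence(single, alpha)  # S_t^[2](x), started at S_0(x)

for t in (0, 1, 5, 20):
    print(t, single[t] == exact_single(a, b, alpha, t),
          double[t] == exact_double(a, b, alpha, t))
```

Dropping the terms in $\beta^{t+1}$ from the two formula functions gives the large-t versions, so the same script can be used to see how slowly the transient dies out for a small smoothing constant.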

It is now apparent how one can derive Brown's results as asymptotic
versions of the exact cases. Since $0 < \beta < 1$, we have $\beta^{t+1} \to 0$ and
$t\beta^{t+1} \to 0$ as $t \to \infty$. Then we may say that, for sufficiently large
values of $t$, we may approximate $S_t(x)$ and $S_t^{[2]}(x)$ by,

$$S_t(x) \approx x_t^{(0)} - \frac{\beta}{\alpha} x_t^{(1)}$$
$$S_t^{[2]}(x) \approx x_t^{(0)} - 2\frac{\beta}{\alpha} x_t^{(1)} \tag{3-4}$$

These are the formulas one would obtain from substituting into the
Fundamental Theorem of page 133.

To actually apply these results and evaluate them statistically, we
would want to consider the model $x_t = \xi_t + \epsilon_t$ where $\xi_t = a + bt$ and,
as before, $\epsilon_t$ has mean zero and variance $\sigma^2$. Brown would have us use
as estimates, based on the data $x_0, x_1, \ldots, x_t$, the quantities,

$$\hat{x}_t^{(0)} = 2S_t(x) - S_t^{[2]}(x)$$
$$\hat{x}_t^{(1)} = \frac{\alpha}{\beta}\left[S_t(x) - S_t^{[2]}(x)\right] \tag{3-5}$$

These are easily obtained by solving (3-4) as though they were equations
and then replacing $x_t^{(0)}$ and $x_t^{(1)}$ by the symbols $\hat{x}_t^{(0)}$ and $\hat{x}_t^{(1)}$, since
they involve or are themselves unknown parameters. By whatever means
they are arrived at, certainly they are properly called estimates since
they are functions of the data $x_0, x_1, \ldots, x_t$. They are not, however,
unbiased as Brown claims if one uses, as one should, the precise formulas
for $S_t(x)$ and $S_t^{[2]}(x)$.

To see that the estimates are biased, we notice first that

$$E[\hat{x}_t^{(0)}] = 2E[S_t(x)] - E[S_t^{[2]}(x)].$$

But,

$$S_t(x) = \alpha \sum_{k=0}^{t-1} \beta^k x_{t-k} + \beta^t x_0$$

and, since $E[x_{t-k}] = a + b(t-k)$, we have,

$$E[S_t(x)] = \alpha \sum_{k=0}^{t-1} \beta^k (a + b(t-k)) + a\beta^t$$

which is the same expression we dealt with in the deterministic model
(the $S_t(x)$ of that model). From that result, we have

$$E[S_t(x)] = a + bt - b\frac{\beta}{\alpha} + \frac{b}{\alpha}\beta^{t+1}.$$

Similarly,

$$E[S_t^{[2]}(x)] = a + bt - 2b\frac{\beta}{\alpha} + 2\frac{b}{\alpha}\beta^{t+1} + bt\beta^{t+1}.$$

Putting these facts together we thus obtain,

$$E[\hat{x}_t^{(0)}] = a + bt - bt\beta^{t+1}$$
$$E[\hat{x}_t^{(1)}] = b - b\beta^t - \alpha bt\beta^t \tag{3-6}$$

In both cases, the estimates are biased downward, with a bias that is
a function of the "trend" b. Since b is unknown, the bias may be
serious depending of course on the magnitude of b. The bias factors
do converge to zero as time increases beyond bounds, however, and we
may say that the estimators Brown gives are thereby asymptotically
unbiased.
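The size of the finite-sample bias in (3-6) is easy to tabulate. The sketch below is illustrative, with hypothetical function names and parameter values: it propagates expectations through the single and double smoothing recursions (valid because expectation is linear), forms the expected values of the two estimates in (3-5), and compares them with the closed forms in (3-6). Both recursions are started at $E[x_0] = a$.

```python
from fractions import Fraction


def expected_estimates(a, b, alpha, t):
    """Return (E[x0_hat], E[x1_hat]) at time t when E[x_s] = a + b*s."""
    beta = 1 - alpha
    s1 = s2 = a                               # E[S_0(x)] = E[S_0^[2](x)] = a
    for s in range(1, t + 1):
        s1 = alpha * (a + b * s) + beta * s1  # single smoothing
        s2 = alpha * s1 + beta * s2           # double smoothing
    return 2 * s1 - s2, (alpha / beta) * (s1 - s2)


def bias_formulas(a, b, alpha, t):
    """Closed forms (3-6) for the same two expectations."""
    beta = 1 - alpha
    return (a + b * t - b * t * beta ** (t + 1),
            b - b * beta**t - alpha * b * t * beta**t)


a, b, alpha = Fraction(100), Fraction(3), Fraction(1, 10)  # hypothetical values
for t in (1, 5, 10, 20, 50):
    level, trend = expected_estimates(a, b, alpha, t)
    assert (level, trend) == bias_formulas(a, b, alpha, t)
    print(t, float(level - (a + b * t)), float(trend - b))
```

With a positive trend b both printed biases are negative at every finite t in this sketch, and they shrink toward zero only as t grows.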

For the case $n = 2$, that is for a quadratic model
$\xi_t = a_0 + a_1 t + \frac{a_2}{2} t^2$, similar conclusions can be reached. The
algebra involved is somewhat burdensome, however, and will not be
repeated here. Suffice it to say that the exact formulas for
$S_t(x)$, $S_t^{[2]}(x)$ and $S_t^{[3]}(x)$ are such that for $t$ sufficiently large,
Brown's versions of these expressions hold. Again, if these
approximations are treated as equations, one can solve the resulting
system for the derivatives $x_t^{(0)}$, $x_t^{(1)}$ and $x_t^{(2)}$ to obtain Brown's
results. When treated as estimates they are not, of course, unbiased
any more than in the linear case. Also, the unsuspecting reader should
be warned that the results published on pages 140 through 144 should
be read and interpreted with caution even after correcting some obvious
misprints. Thus, on page 140 for example, $\hat{a}_0(t)$ and $\hat{a}_1(t)$ are not,
as one might presume from the model, estimates of $a_0$ and $a_1$ but rather
estimates of $x^{(0)}(t) = a_0 + a_1 t + \frac{a_2}{2} t^2$ and $x^{(1)}(t) = a_1 + a_2 t$
respectively. Happily, of course, $\hat{a}_2(t)$ does happen to be an estimate
of $a_2$ since, for this case, $x^{(2)}(t) = a_2$.

No  attempt  was  made  to  examine  the  results  for  higher  order 
polynomials.  Based  on  the  quadratic  model,  it  is  clear  that  the  algebra 
involved  would  be  too  unwieldy  to  make  the  task  practical.  Perhaps  this 
is  as  good  a  justification  for  resorting  to  asymptotic  results  as  any. 
And  it  should  be  stated  that  there  is  no  serious  objection  to  deriving 
asymptotic  results  and  considering  estimators  with  only  asymptotic 
properties.  The  objection  is  to  the  inordinate  use  of  the  same  notation 
for  the  finite  case  and  the  asymptotic  case  in  formula  after  formula. 
Together  with  a  complete  lack  of  any  discussion  of  the  difference,  it 
leads  the  unsuspecting  reader  to  believe  that  the  results  are  stronger 
than  they  really  are. 




4.  Mean  Absolute  Deviation 


In  inventory  applications  of  random  demand  models,  safety  levels 
are  often  determined  in  terms  of  some  measure  of  variability,  usually 
the  common  standard  deviation  of  the  demand  distribution.  As  was 
mentioned  in  the  introduction,  Brown  prefers  to  use  mean  absolute 
deviation, or MAD for short, in spite of the statistical grounds
for not using this particular measure. As he points out (page 275)
the  mean  absolute  deviation  is  proportional  to  the  standard  deviation 
in  any  probability  distribution.  Both  are,  after  all,  functions  of  the 
parameters  of  the  distribution.  But  finding  an  appropriate  estimate 
for  MAD  and  deriving  the  corresponding  distribution  theory  to  guarantee 
the  required  probability  for  safety  levels  is  quite  another  matter. 

Brown  has  not  done  this  and,  to  make  matters  worse,  never  distinguishes 
between  a  population  or  true  MAD  and  an  estimate  thereof,  even  to  the 
point  of  using  the  same  symbol  and  name  for  them. 

In the first place, the definition adopted by Brown for MAD, denoted
$\Delta$, reduces to $\Delta = E[\,|x - \mu|\,]$ where $x$ is any random variable having
mean $\mu$. As he himself points out on page 283, it would be better to
define $\Delta$ as $E[\,|x - m|\,]$ where $m$ is any median of the distribution of $x$.
This is because $E[\,|x - c|\,]$ is minimized by choosing $c = m$. Yet he
ignores this criterion and uses $\mu$ instead of $m$, justifying his choice
on the basis that forecasts estimate means rather than medians. But
if one can justify computing $\Delta$ instead of $\sigma$ because $\Delta$ is proportional
to $\sigma$, surely the same argument can be used to estimate $m$ instead of $\mu$.
This is hardly a convincing reason but we will pass this point and
use Brown's definition. Of course, in a symmetric distribution $\mu = m$,
as he brings out. But it is precisely in the applications to random
demand that skewed distributions such as the Poisson and Negative
Binomial families arise in practice. This is especially pertinent
to standard assumptions in Naval supply systems.

Brown quite aptly shows that the ratio of $\Delta$ to $\sigma$ is approximately
0.8 for the Normal, Exponential, Uniform and Triangular families of
probability distributions. Yet, except for the normal family, the
interest must be primarily academic so far as inventory applications
are concerned. It would be far more interesting, and quite instructive,
to see what the situation is for other distributions. In particular,
an examination of the Poisson family reveals that 0.8 can be a very
poor approximation. In the Poisson mass function

$$p(x; \lambda) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \ldots \quad \text{with } 0 < \lambda < 1,$$

we have

$$\Delta = \sum_{x=0}^{\infty} |x - \lambda| \, p(x; \lambda)
= \lambda e^{-\lambda} + \sum_{x=1}^{\infty} (x - \lambda) \frac{e^{-\lambda} \lambda^x}{x!}
= \lambda e^{-\lambda} + \sum_{x=1}^{\infty} x \frac{e^{-\lambda} \lambda^x}{x!} - \lambda \sum_{x=1}^{\infty} \frac{e^{-\lambda} \lambda^x}{x!}$$

$$= \lambda e^{-\lambda} + \lambda - \lambda (1 - e^{-\lambda}) = 2\lambda e^{-\lambda}.$$
Since $\sigma = \sqrt{\lambda}$, we have $\Delta/\sigma = 2\sqrt{\lambda}\, e^{-\lambda}$. Values of this ratio are shown
for a variety of values of $\lambda$ in Table 1.

    λ      0.01    0.05    0.10    0.25    0.50    0.75    0.90    0.99
    Δ/σ    0.198   0.425   0.572   0.779   0.858   0.818   0.771   0.739

TABLE 1. Ratio Δ/σ for Poisson family

As is evident from the table, the approximation 0.8 is extremely poor for
slow moving items where the Poisson with small mean $\lambda$ is a typical
assumption. For values of $\lambda > 1$ in the Poisson family and the geometric
distribution with mean greater than unity, a similar analysis shows
that the approximation 0.8 is not bad, however.
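The Poisson calculation can be reproduced by direct summation of the mass function. The sketch below is an illustration and not part of the original analysis: the function names, the truncation point of the series, and the particular values of the Poisson mean are assumptions. It compares the summed mean absolute deviation about the mean with the closed form $2\lambda e^{-\lambda}$ (valid for $0 < \lambda < 1$) and prints the ratio to the standard deviation.

```python
import math


def poisson_mad(lam, terms=200):
    """Sum |x - lam| * p(x; lam) over x = 0..terms-1, building p(x) iteratively."""
    p = math.exp(-lam)        # p(0; lam)
    total = 0.0
    for x in range(terms):
        total += abs(x - lam) * p
        p *= lam / (x + 1)    # p(x+1) = p(x) * lam / (x+1)
    return total


def mad_to_sigma_ratio(lam):
    """Ratio of the mean absolute deviation to the standard deviation sqrt(lam)."""
    return poisson_mad(lam) / math.sqrt(lam)


for lam in (0.01, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.99):  # hypothetical means
    closed = 2.0 * lam * math.exp(-lam)
    print(f"lam={lam:4.2f}  MAD={poisson_mad(lam):.6f}  closed={closed:.6f}  "
          f"ratio={mad_to_sigma_ratio(lam):.3f}")
```

The ratio peaks near a mean of one half and falls well below 0.8 for very small means, which is the slow-moving-item regime discussed above.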

This may appear to be a minor academic point until one finds that
the same ratio of $\Delta$ to $\sigma$ is used in the applications of Chapter 20 quite
independent  of  any  assumption  as  to  the  underlying  probability 
distribution of demand. Also we might point out that even though $\Delta$ is
proportional to $\sigma$ in the population, it does not follow that the
estimates $\hat{\Delta}$ and $\hat{\sigma}$ enjoy the same sort of relationship. This would
imply  a  type  of  invariance  principle  such  as  that  enjoyed  by  maximum 
likelihood  estimates,  and  is,  in  general,  not  true  when  the  estimates 
are  not  maximum  likelihood. 

This brings up another matter concerning MAD estimates. Brown
uses error forecasts to estimate $\Delta$. In fact, for the particular data
$x_0, x_1, \ldots, x_t$, the error forecast $e(t)$ is defined by $e(t) = x_t - \hat{x}_{t-1}$,
where $\hat{x}_{t-1}$ is taken to be the forecast at time $t-1$ of the demand at
time $t$. Now in our basic model with constant mean, $\xi = a$, and
exponential smoothing used to estimate the mean, we have

$$\hat{x}_{t-1} = \alpha \sum_{k=0}^{t-2} \beta^{k} x_{t-1-k} + \beta^{t-1} x_0$$

and if $E[x_0] = a$, $E[\hat{x}_{t-1}] = a$. It then follows that $E[e(t)] = 0$
and, from independence, the variance $\sigma_e^2(t)$ of the error forecast
becomes

$$\sigma_e^2(t) = \sigma^2 + \frac{\alpha}{1 + \beta} \left(1 - \beta^{2t-2}\right) \sigma^2 + \beta^{2t-2} \sigma^2,$$

as can be easily verified. Letting $t \to \infty$ we observe that the limiting
variance $\sigma_e^2$ is given by

$$\sigma_e^2 = \left(1 + \frac{\alpha}{1 + \beta}\right) \sigma^2 = \frac{2}{2 - \alpha} \, \sigma^2,$$


a  formula  which  is  used  throughout  the  text  by  Brown  as  though  it  were 
valid  for  all  t.  Incidentally,  if  there  is  a  possibility  of  trend 
present  so  that  the  assumption  of  constant  mean  is  suspect,  not  even 
this  asymptotic  formula  should  be  used  to  describe  the  variance  of 
forecast  error. 
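
The gap between the finite-$t$ variance and its limit can be made concrete. The sketch below is illustrative only: the function names are assumptions, the demands are taken to be independent with common variance, and the initial forecast is taken to be $x_0$, as in the expression for the smoothed forecast above. It builds the variance from the smoothing weights themselves and compares it with the closed form and with the limit.

```python
def forecast_error_variance(t: int, alpha: float, sigma2: float = 1.0) -> float:
    """Exact variance of e(t) = x_t - xhat_{t-1} for t >= 1.

    Assumes a constant mean, independent demands with variance sigma2,
    and an initial forecast equal to x_0.
    """
    beta = 1.0 - alpha
    # Weights on x_{t-1}, ..., x_1 are alpha * beta**k; the weight on x_0 is beta**(t-1).
    weights = [alpha * beta ** k for k in range(t - 1)] + [beta ** (t - 1)]
    return sigma2 * (1.0 + sum(w * w for w in weights))


def closed_form_variance(t: int, alpha: float, sigma2: float = 1.0) -> float:
    """Closed form: sigma2 * (1 + alpha/(1+beta) * (1 - beta**(2t-2)) + beta**(2t-2))."""
    beta = 1.0 - alpha
    decay = beta ** (2 * t - 2)
    return sigma2 * (1.0 + alpha / (1.0 + beta) * (1.0 - decay) + decay)


def limiting_variance(alpha: float, sigma2: float = 1.0) -> float:
    """Limit as t grows without bound: 2 * sigma2 / (2 - alpha)."""
    return 2.0 * sigma2 / (2.0 - alpha)


if __name__ == "__main__":
    alpha = 0.1
    for t in (1, 5, 10, 25, 50):
        print(t, round(forecast_error_variance(t, alpha), 4),
              round(limiting_variance(alpha), 4))
```

Under these assumptions the exact variance exceeds the limit by $\beta^{2t-2} \cdot 2\beta/(1+\beta) \cdot \sigma^2$ for every finite $t$, so the asymptotic formula understates the error variance early in the series, and more so when $\alpha$ is small.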

Granted that t is sufficiently large so that the above asymptotic

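variance applies, it would follow that the true MAD for $e_t$, say $\Delta_e$,
would be defined by $E[|e_t|]$ since $E[e_t] = 0$. Then if it were
true that $\Delta_e = \sqrt{2/\pi} \, \sigma_e$, as for a normal distribution, it would then
follow that

$$\Delta_e = \sqrt{\frac{2}{\pi}} \sqrt{\frac{2}{2 - \alpha}} \; \sigma$$

as Brown claims. Then of course

$$\sigma = \sqrt{\frac{\pi}{2}} \sqrt{\frac{2 - \alpha}{2}} \; \Delta_e$$

and if we can estimate $\Delta_e$, we could then estimate $\sigma$ by invoking an
(unproved) invariance principle obtaining

$$\hat{\sigma} = \sqrt{\frac{\pi}{2}} \sqrt{\frac{2 - \alpha}{2}} \; \hat{\Delta}_e.$$

In other words, if $\hat{\sigma}$ is the usual maximum likelihood estimate of
$\sigma$ for the present assumption, it follows from the invariance principle
that

$$\hat{\Delta}_e = \sqrt{\frac{2}{\pi}} \sqrt{\frac{2}{2 - \alpha}} \; \hat{\sigma}$$

is the maximum likelihood estimate of $\Delta_e$. We are on safe grounds,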

statistically speaking. Now, a reasonable estimate of $\Delta_e$ based on
the sample $e_1, e_2, \ldots, e_t$ and the fact that $E[e_t] = 0$ would be the
sample analogue of $E[|e_t|]$, namely,

$$\frac{1}{t} \sum_{i=1}^{t} |e_i|.$$

Brown, however, guided by exponential smoothing, uses instead the estimate

$$\hat{\Delta}_e(t) = \alpha \, |e_t| + (1 - \alpha) \, \hat{\Delta}_e(t-1).$$

Thus, apart from the initial condition, $\hat{\Delta}_e$ is an exponentially weighted
average of the same variables $|e_1|, |e_2|, \ldots, |e_t|$, which makes it about
twice removed from any known distribution theory. If $\hat{\Delta}_e$ is used in
the above formula for $\hat{\sigma}$, what can be said about the resulting estimate?
It is definitely not maximum likelihood. Neither is it unbiased nor
likely to be minimum variance. In truth, without some knowledge of
the distribution of $\hat{\Delta}_e$ even under normality assumptions, very little
can be said about $\hat{\sigma}$.
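
The two competing estimates of the MAD of the forecast errors can be placed side by side. The following Python sketch is illustrative only; it assumes the usual smoothing recursion, new MAD = alpha * |e| + (1 - alpha) * old MAD, and the names `sample_mad`, `smoothed_mad` and `smoothed_weights` are hypothetical. The last function makes explicit the geometric weights that the smoothed estimate attaches to the absolute errors.

```python
from typing import List, Sequence


def sample_mad(errors: Sequence[float]) -> float:
    """Sample analogue of E|e_t| when E[e_t] = 0: the plain average of |e_i|."""
    return sum(abs(e) for e in errors) / len(errors)


def smoothed_mad(errors: Sequence[float], alpha: float, initial: float) -> float:
    """Exponentially smoothed MAD (assumed recursion), started from `initial`."""
    mad = initial
    for e in errors:
        mad = alpha * abs(e) + (1.0 - alpha) * mad
    return mad


def smoothed_weights(n: int, alpha: float) -> List[float]:
    """Weights placed on |e_1|, ..., |e_n| by the smoothed MAD.

    The initial value carries the remaining weight (1 - alpha)**n.
    """
    beta = 1.0 - alpha
    return [alpha * beta ** (n - i) for i in range(1, n + 1)]


if __name__ == "__main__":
    errors = [2.0, -1.0, 3.0, -2.0]
    print(sample_mad(errors))              # equal weights 1/n on each |e_i|
    print(smoothed_mad(errors, 0.5, 0.0))  # geometric weights, newest error heaviest
    print(smoothed_weights(4, 0.5))
```

The sample average weights every absolute error equally, whereas the smoothed version weights them geometrically and retains a term from the arbitrary starting value, which is one reason its sampling distribution is harder to characterize.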

In summary, then, there is a definite need for more distribution
theory before a strong case can be made for exponentially smoothed
estimates of MAD. Brown claims on page 2b6 that, "If one can estimate
the mean absolute deviation of the forecast errors, it is quite simple
to infer the probability that any given multiple of the estimated value
will be exceeded." Quite the contrary, however, it is not only difficult
but practically impossible to infer such probability statements without
a knowledge of the distributions involved. For example, even if $x$ is
normal with mean $\mu$ and variance $\sigma^2$ so that for any $0 < \gamma < 1$ we can
compute the value of $K$ such that

$$\gamma = \Pr[x > \mu + K\sigma],$$

it does not follow that when we estimate $\mu$ by exponential smoothing, say $\hat{\mu}$,
and $\sigma$ by $\sqrt{\pi/2} \, \hat{\Delta}$, that $\Pr[x > \hat{\mu} + K \sqrt{\pi/2} \, \hat{\Delta}]$ is still $\gamma$.
Yet this seems to be tacitly implied at several points of the book. At
the very least, one should have some simulation results for the
distribution of $\hat{\mu} + K \sqrt{\pi/2} \, \hat{\Delta}$ to make the result more plausible, as
recommended by Asher and Wallace [6]. As they point out, if the usual
Gauss-Markov assumptions are made, MAD or any estimator other than least
squares will come off second best. The results of their study show that
MAD is about 20% efficient compared to minimum variance estimators and
also displayed greater bias.
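
A simulation of the kind called for here might be organized as follows. This is only a sketch under stated assumptions, not a result from the report or from Asher and Wallace: demands are taken to be independent normal variates, the mean and the MAD are both smoothed with the same constant, the MAD is started at its true normal value, and all parameter values and the name `exceedance_frequency` are arbitrary illustrative choices.

```python
import math
import random
from statistics import NormalDist


def exceedance_frequency(gamma: float, alpha: float, n_periods: int = 50,
                         n_trials: int = 2000, mu: float = 100.0,
                         sigma: float = 10.0, seed: int = 0) -> float:
    """Fraction of trials in which the next demand exceeds
    mu_hat + K * sqrt(pi/2) * mad_hat, with K chosen for the nominal level gamma."""
    rng = random.Random(seed)
    # K such that Pr[x > mu + K * sigma] = gamma for a normal variate x.
    k = NormalDist().inv_cdf(1.0 - gamma)
    scale = math.sqrt(math.pi / 2.0)
    exceed = 0
    for _ in range(n_trials):
        mu_hat = rng.gauss(mu, sigma)  # initial forecast is the first observation
        mad_hat = sigma / scale        # assumption: MAD started at its true normal value
        for _ in range(n_periods):
            x = rng.gauss(mu, sigma)
            err = x - mu_hat           # forecast error, computed before updating
            mad_hat = alpha * abs(err) + (1.0 - alpha) * mad_hat
            mu_hat = alpha * x + (1.0 - alpha) * mu_hat
        if rng.gauss(mu, sigma) > mu_hat + k * scale * mad_hat:
            exceed += 1
    return exceed / n_trials


if __name__ == "__main__":
    for gamma in (0.01, 0.05, 0.10):
        print(gamma, exceedance_frequency(gamma, alpha=0.1))
```

Comparing the printed frequencies with the nominal values of $\gamma$, over a range of smoothing constants and series lengths, indicates how far the implied probability statement can be trusted in a given setting.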


5.  Conclusions  and  Recommendations 

Lest  this  report  be  taken  as  a  total  indictment  of  exponential 
smoothing  as  a  forecasting  technique,  let  it  be  said  that  it  is  freely 
admitted  that  this  idea  of  weighting  the  past  with  ever-decreasing 
weights  has  a  great  deal  of  intuitive  appeal.  And  it  is  granted  that  the 
technique has a computational advantage in requiring less computer
storage than more standard techniques. Carried to its extreme, however,
this argument would equally well justify using only the current observation for
estimation purposes and ignoring the past completely. At least such an estimator
would possess some well known statistical properties.

And  this  is  one  of  the  points  we  wish  to  stress.  An  estimator,  to 
be  valuable,  must  satisfy  various  criteria  that  have  been  used  to  judge 
such  estimators.  Exponential  smoothing,  regardless  of  its  intuitive 
appeal, must be able to stand the test alongside other alternatives.
Invariably, this involves some knowledge of the probability distribution
of estimators. Without such knowledge, it is difficult to approve
or disapprove heartily of exponential smoothing. Certainly Brown has
not developed such theory and neither, apparently, has anyone else to
any extent. Lacking such a theory, Astrachan and Sherbrooke [7] recently
carried out an empirical test of exponential smoothing.
The results showed that exponential smoothing was not significantly
better than techniques currently being used.

But  even  if  these  statistical  points  were  resolved  we  would  have 
to  object  to  the  way  in  which  the  results  are  presented  in  Brown's 
book  for  reasons  clearly  detailed  in  this  report.  To  this  end  we 
are  inclined  to  agree  with  the  review  of  the  book  done  for  Operations 
Research  (Vol.  13,  No.  2)  by  Fishman  who  says,  "In  assessing  the  over-all 
contribution  of  this  book  to  the  forecasting  literature,  I  would  argue 
that  it  confuses  rather  than  enlightens  the  well-informed  as  well  as 
the  mathematically  unsophisticated  reader."  The  writer  would  add  that 
even  the  mathematically  sophisticated  reader  may  have  considerable 
difficulty  unravelling  some  of  the  ambiguity  present  in  various  formulae 
as  well  as  justifying  several  claims  to  mathematical  rigor.  In  any  case, 
the  user  of  this  book  should  be  aware  of  the  asymptotic  nature  of  the 
results  and  apply  them  with  this  restriction  in  mind. 

Finally,  we  have  seen  that  the  indiscriminate  use  of  mean  absolute 
deviation  as  a  measure  of  statistical  variation  creates  the  same 
theoretical  problems  that  have  caused  it  to  be  abandoned  by  statisticians 
these  many  years.  As  Asher  and  Wallace  [6]  put  it,  "...  one  should  be 
prepared to give up considerable efficiency." Obtaining probability
distributions for the MAD estimators introduced by
Brown appears to be extremely difficult at best. We re-emphasize the
fact  that  such  estimators,  as  well  as  any  exponential  smoothing 
estimators,  must  be  more  than  a  means  of  arriving  at  a  number,  ease 
of  computation  notwithstanding.  Perhaps  the  variance  estimation 
techniques  we  have  criticized  in  this  report  are  fruitful.  But  without 
some  knowledge  of  the  theory,  and  their  probability  distributions  in 
particular,  there  simply  is  no  way  to  pass  judgment  on  them. 

As  for  further  research,  the  areas  we  have  been  discussing  offer 
rich  opportunities  indeed.  Since  this  report  has  essentially  been 
devoted to a critique of Brown's book, it is, perforce, negative in its
spirit  and  conclusions.  A  more  positive  approach  would  be  to  define 
alternative  procedures  which  would  be  as  appealing  as  smoothing  for 
computing  purposes  and  would  admit  a  statistical  theory  at  the  same 
time. Such an alternative is especially needed for a measure of statistical
variation to replace MAD as a means of determining safety levels. It is strongly recommended
that  further  research  in  this  specific  direction  be  undertaken.  It  may 
very  well  turn  out  that  the  smoothing  procedures  are  actually  close  to 
optimal  in  some  sense.  But  it  needs  to  be  established  that  they  are. 

It  does  not  appear  feasible  to  develop  formulas  for  exponential 
smoothing  beyond  the  quadratic  model.  The  algebra  involved  is  simply 
too  unwieldy.  Perhaps  it  might  be  wise  to  reiterate  at  this  point  that 
we  have  no  objection  to  asymptotic  results  as  long  as  they  are  clearly 
labeled as such. Indeed, for higher order polynomials it appears necessary
to resort to such limiting results. Another possible area of research
would thus be to investigate further the statistical properties of
Brown's asymptotic formulae.